Load_the_Data
Summary
## [1] 89206 30
## [1] "Pnr.Creat.Date" "month_pnr_create"
## [3] "Rloc" "Bkg.User.Id"
## [5] "Collect.Method" "Curr.First.Dep.Apt"
## [7] "numberOfDay" "month_dep"
## [9] "from_1st" "to_1st"
## [11] "from_2nd" "to_2nd"
## [13] "from_3rd" "to_3rd"
## [15] "destination" "type"
## [17] "Season" "Curr.First.Dep.Date"
## [19] "Pnr.Pax.Count" "Lang"
## [21] "Pnr.Curr.Code" "Pnr.Fare.Amt"
## [23] "Bkg.Subclass" "Bkg_Subclass"
## [25] "Net.Net.Rev" "Sales.Promo"
## [27] "first_ticket_O" "first_ticket_D"
## [29] "Route" "Pnr.Creat.Pseudo.City.Code"
## 'data.frame': 89206 obs. of 30 variables:
## $ Pnr.Creat.Date : Factor w/ 365 levels "1/1/15","1/10/15",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ month_pnr_create : Factor w/ 12 levels ".01.",".02.",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Rloc : Factor w/ 88689 levels "2.34E+08","211 (Nov)DG",..: 62064 61972 62040 62067 61906 61917 62199 61948 62019 62068 ...
## $ Bkg.User.Id : Factor w/ 20257 levels "1000054769","1000055266",..: 8947 14163 20184 10839 19254 6424 9034 19461 4291 20230 ...
## $ Collect.Method : Factor w/ 2 levels "","ET": 2 2 2 2 2 2 2 2 2 2 ...
## $ Curr.First.Dep.Apt : Factor w/ 126 levels "ADL","AMS","ARN",..: 92 58 58 92 58 58 58 58 58 10 ...
## $ numberOfDay : int 177 85 48 44 10 22 11 109 100 95 ...
## $ month_dep : Factor w/ 12 levels ".01.",".02.",..: 6 3 2 2 1 1 1 4 4 4 ...
## $ from_1st : Factor w/ 66 levels "ADL","AMS","ARN",..: 53 53 53 53 53 53 53 53 1 11 ...
## $ to_1st : Factor w/ 107 levels "ADL","AMS","ARN",..: 68 94 68 68 94 94 94 68 49 49 ...
## $ from_2nd : Factor w/ 88 levels "","AMS","BKK",..: 44 77 44 44 77 77 77 60 44 44 ...
## $ to_2nd : Factor w/ 110 levels "","ADL","AMS",..: 80 52 80 80 52 52 52 52 2 110 ...
## $ from_3rd : Factor w/ 62 levels "","DOH","FCO",..: 25 1 25 25 1 1 1 1 1 1 ...
## $ to_3rd : Factor w/ 99 levels "","ADL","AMS",..: 44 1 44 44 1 1 1 1 1 1 ...
## $ destination : Factor w/ 95 levels "ADL","AMS","BKK",..: 43 43 43 43 43 43 43 43 1 95 ...
## $ type : Factor w/ 3 levels "Muli","Oneway",..: 3 3 3 3 3 3 3 3 3 2 ...
## $ Season : Factor w/ 4 levels "Season1","Season2",..: 2 1 1 1 1 1 1 2 2 2 ...
## $ Curr.First.Dep.Date : Factor w/ 702 levels "1-Apr-15","1-Apr-16",..: 448 450 212 120 53 352 76 276 46 609 ...
## $ Pnr.Pax.Count : int 1 1 1 1 1 1 1 1 1 2 ...
## $ Lang : Factor w/ 14 levels "CN","DE","FR",..: 4 4 4 1 4 4 4 4 4 4 ...
## $ Pnr.Curr.Code : Factor w/ 2 levels "","AUD": 2 2 2 2 2 2 2 2 2 2 ...
## $ Pnr.Fare.Amt : num 2879 1789 1748 2432 2394 ...
## $ Bkg.Subclass : Factor w/ 297 levels "B","B, C","B, E",..: 11 28 28 34 47 47 47 80 80 80 ...
## $ Bkg_Subclass : Factor w/ 20 levels "B","C","D","E",..: 1 1 1 1 1 1 1 4 4 4 ...
## $ Net.Net.Rev : num 1866 1127 922 1503 1599 ...
## $ Sales.Promo : Factor w/ 27 levels "","AUSHKGYOW5 (May)15",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ first_ticket_O : Factor w/ 66 levels "ADL","AMS","ARN",..: 53 53 53 53 53 53 53 53 1 11 ...
## $ first_ticket_D : Factor w/ 107 levels "ADL","AMS","ARN",..: 68 94 68 68 94 94 94 68 49 49 ...
## $ Route : Factor w/ 3738 levels "ADLHKG","ADLHKG, AMSHKG, DOHBCN, HKGADL, HKGDOH",..: 2897 3659 2897 2897 3659 3659 3659 3050 112 1035 ...
## $ Pnr.Creat.Pseudo.City.Code: Factor w/ 3 levels "SYDCX08AA","SYDCX08CP",..: 1 1 1 1 1 1 1 1 1 1 ...
## [1] "CN" "DE" "FR" "GB" "ID" "IT" "JP" "KO" "KR" "NL" "RE" "RU" "TH" "TW"
## [1] "ADL" "AMS" "ARN" "ATH" "BAH" "BCN" "BJS" "BKI" "BKK" "BLR" "BNE"
## [12] "BOM" "BOS" "BUD" "CAN" "CCU" "CDG" "CEB" "CGK" "CGO" "CJU" "CKG"
## [23] "CMB" "CNS" "CNX" "CPH" "CPT" "CRK" "CSX" "CTS" "CTU" "DAC" "DAD"
## [34] "DEL" "DFW" "DLC" "DME" "DOH" "DPS" "DUR" "DUS" "DXB" "EWR" "FCO"
## [45] "FOC" "FRA" "FUK" "HAK" "HAN" "HGH" "HIJ" "HKD" "HKG" "HND" "JFK"
## [56] "KHH" "KUL" "MEL" "MLE" "NGB" "PEK" "PER" "PVG" "SYD" "TPE" "XIY"
## [1] "ADL" "AMS" "ARN" "ATH" "BAH" "BCN" "BKI" "BKK" "BLR" "BNE" "BOM"
## [12] "BOS" "BUD" "CAN" "CCU" "CDG" "CEB" "CGK" "CGO" "CJU" "CKG" "CMB"
## [23] "CNS" "CNX" "CPH" "CRK" "CSX" "CTS" "CTU" "DAC" "DAD" "DEL" "DFW"
## [34] "DLC" "DME" "DOH" "DPS" "DUS" "DXB" "EWR" "FCO" "FOC" "FRA" "FUK"
## [45] "GVA" "HAK" "HAN" "HGH" "HKG" "HKT" "HND" "HYD" "ICN" "IST" "JFK"
## [56] "JNB" "KHH" "KIX" "KMG" "KTM" "KUL" "KWL" "LAX" "LHR" "MAA" "MAD"
## [67] "MAN" "MEL" "MLE" "MNL" "MXP" "NGB" "NGO" "NKG" "NRT" "OKA" "ORD"
## [78] "PAR" "PEK" "PEN" "PER" "PNH" "PUS" "PVG" "REP" "RGN" "RMQ" "RUH"
## [89] "SFO" "SGN" "SHA" "SIN" "SUB" "SYD" "SYX" "TAO" "TPE" "TSN" "USM"
## [100] "VCE" "WNZ" "WUH" "XIY" "XMN" "YVR" "YYZ" "ZRH"
## [1] "ADL" "AMS" "BKK" "BLR" "BNE" "BOM" "BOS" "CAN" "CCU" "CDG" "CEB"
## [12] "CGK" "CGO" "CJU" "CKG" "CMB" "CNS" "CNX" "CPT" "CRK" "CSX" "CTS"
## [23] "CTU" "DAC" "DAD" "DEL" "DFW" "DLC" "DME" "DOH" "DPS" "DUR" "DUS"
## [34] "DXB" "EWR" "FCO" "FOC" "FRA" "FUK" "HAK" "HAN" "HGH" "HKG" "HKT"
## [45] "HND" "HYD" "ICN" "JFK" "JMU" "JNB" "KHH" "KIX" "KMG" "KTM" "KUL"
## [56] "KWL" "LAX" "LHR" "MAA" "MAD" "MAN" "MEL" "MLE" "MNL" "MXP" "NGB"
## [67] "NGO" "NKG" "NRT" "OKA" "ORD" "PEK" "PER" "PNH" "PUS" "PVG" "RMQ"
## [78] "RUH" "SFO" "SGN" "SHA" "SIN" "SYD" "SYX" "TAO" "TPE" "TSN" "VVO"
## [89] "WNZ" "WUH" "XIY" "XMN" "YVR" "YYZ" "ZRH"
## [1] "B" "C" "D" "E" "F" "H" "I" "J" "K" "L" "M" "N" "Q" "R" "S" "T" "U"
## [18] "V" "W" "Y"
## Pnr.Creat.Date month_pnr_create Rloc Bkg.User.Id
## 15/10/15: 697 .10. :11437 3WVWN4 : 3 GUEST : 487
## 19/2/15 : 633 .02. :10858 222MBM : 2 GUEST366: 93
## 16/10/15: 617 .03. : 9827 224FOR : 2 GUEST765: 87
## 9/2/15 : 608 .11. : 7784 22K5RR : 2 GUEST392: 85
## 19/10/15: 589 .04. : 7401 23TSE8 : 2 GUEST456: 85
## 17/2/15 : 567 .09. : 6800 242SJ6 : 2 GUEST457: 85
## (Other) :85495 (Other):35099 (Other):89193 (Other) :88284
## Collect.Method Curr.First.Dep.Apt numberOfDay month_dep
## : 4 HKG :23418 Min. :-18.00 .06. : 8921
## ET:89202 SYD :14016 1st Qu.: 29.00 .05. : 8794
## MEL : 9756 Median : 74.00 .11. : 8117
## PER : 7809 Mean : 97.35 .04. : 8085
## BNE : 4355 3rd Qu.:144.00 .03. : 8072
## ADL : 4004 Max. :492.00 .12. : 8061
## (Other):25848 (Other):39156
## from_1st to_1st from_2nd to_2nd
## HKG :57990 HKG :31132 HKG :52809 HKG :31092
## BNE : 8605 SYD :14836 SYD :13890 SYD : 8652
## ADL : 7148 MEL :13194 MEL : 8953 BNE : 6077
## CNS : 1917 PER : 7634 PER : 5414 : 5261
## SYD : 1762 PEK : 2914 : 5261 PER : 5174
## MEL : 1275 ICN : 2536 CNS : 229 ADL : 4909
## (Other):10509 (Other):16960 (Other): 2650 (Other):28041
## from_3rd to_3rd destination type
## :40307 :40307 HKG :79862 Muli :19930
## HKG :19674 HKG :29050 BNE : 2164 Oneway: 7771
## MEL : 6228 SYD : 4661 ADL : 1684 return:61505
## PER : 3554 MEL : 2740 CNS : 542
## SYD : 2964 PER : 1624 SYD : 374
## ICN : 2363 PVG : 1363 PEK : 350
## (Other):14116 (Other): 9461 (Other): 4230
## Season Curr.First.Dep.Date Pnr.Pax.Count Lang
## Season1:20096 26-Jul-15: 386 Min. :1.000 GB :77207
## Season2:25800 23-Dec-15: 347 1st Qu.:1.000 CN : 5760
## Season3:20292 23-Nov-15: 343 Median :1.000 TW : 5179
## Season4:23018 31-Jan-16: 341 Mean :1.373 KO : 619
## 1-Feb-16 : 337 3rd Qu.:2.000 JP : 220
## 2-Aug-15 : 323 Max. :6.000 DE : 68
## (Other) :87129 (Other): 153
## Pnr.Curr.Code Pnr.Fare.Amt Bkg.Subclass Bkg_Subclass
## : 320 Min. : 0.0 V :29638 V :29968
## AUD:88886 1st Qu.: 938.7 S : 9908 L :14389
## Median : 1365.3 L, V : 8802 S : 9922
## Mean : 1979.2 M, V : 5715 M : 9171
## 3rd Qu.: 2196.4 E : 3213 K : 6999
## Max. :33517.4 K, V : 3149 E : 5718
## (Other):28781 (Other):13039
## Net.Net.Rev Sales.Promo first_ticket_O
## Min. : 15.93 :88721 HKG :57990
## 1st Qu.: 421.92 RUGBY7AU : 326 BNE : 8605
## Median : 612.36 AUSHKGYOW5 (May)15: 49 ADL : 7148
## Mean : 950.15 MICE549 : 27 CNS : 1917
## 3rd Qu.: 1009.42 MICE629 : 15 SYD : 1762
## Max. :20161.28 MICE989 : 15 MEL : 1275
## NA's :26605 (Other) : 53 (Other):10509
## first_ticket_D Route Pnr.Creat.Pseudo.City.Code
## HKG :31132 HKGSYD, SYDHKG:11697 SYDCX08AA:88597
## SYD :14836 HKGMEL, MELHKG: 6946 SYDCX08CP: 330
## MEL :13194 HKGPER, PERHKG: 4034 SYDCX08PA: 279
## PER : 7634 BNEHKG, HKGBNE: 2001
## PEK : 2914 SYDHKG : 1762
## ICN : 2536 ADLHKG, HKGADL: 1657
## (Other):16960 (Other) :61109
Most passengers buy return tickets flight from HKG in english version. For each transaction , the median of the passenger is one. The amount of fare from AUD 0 to AUD 33,517.4 There is 61505 is return ticket. Some of transaction duplicate that the number of RLOC is more than one.
Remove all duplicate row
## [1] 88356 30
## Pnr.Creat.Date month_pnr_create Rloc Bkg.User.Id
## 15/10/15: 692 .10. :11360 2.34E+08 : 1 GUEST : 486
## 19/2/15 : 628 .02. :10755 211 (Nov)DG: 1 GUEST366: 91
## 16/10/15: 614 .03. : 9716 22 (Feb)V2 : 1 GUEST765: 87
## 9/2/15 : 602 .11. : 7725 2222MP : 1 GUEST392: 85
## 19/10/15: 579 .04. : 7297 2224JW : 1 GUEST456: 84
## 18/2/15 : 563 .09. : 6748 2224UD : 1 GUEST457: 84
## (Other) :84678 (Other):34755 (Other) :88350 (Other) :87439
## Collect.Method Curr.First.Dep.Apt numberOfDay month_dep
## : 4 HKG :23142 Min. : 0.0 .06. : 8845
## ET:88352 SYD :13935 1st Qu.: 29.0 .05. : 8725
## MEL : 9685 Median : 74.0 .03. : 8039
## PER : 7779 Mean : 97.2 .11. : 8035
## BNE : 4335 3rd Qu.:144.0 .04. : 8008
## ADL : 3968 Max. :492.0 .12. : 7967
## (Other):25512 (Other):38737
## from_1st to_1st from_2nd to_2nd
## HKG :57459 HKG :30813 HKG :52436 HKG :30760
## BNE : 8538 SYD :14728 SYD :13763 SYD : 8585
## ADL : 7085 MEL :13098 MEL : 8865 BNE : 6043
## CNS : 1903 PER : 7580 PER : 5361 PER : 5151
## SYD : 1710 PEK : 2881 : 5117 : 5117
## MEL : 1254 ICN : 2515 CTU : 228 ADL : 4872
## (Other):10407 (Other):16741 (Other): 2586 (Other):27828
## from_3rd to_3rd destination type
## :39775 :39775 HKG :79140 Muli :19786
## HKG :19531 HKG :28876 BNE : 2145 Oneway: 7578
## MEL : 6194 SYD : 4628 ADL : 1673 return:60992
## PER : 3535 MEL : 2705 CNS : 536
## SYD : 2952 PER : 1619 SYD : 350
## ICN : 2353 PVG : 1356 PEK : 345
## (Other):14016 (Other): 9397 (Other): 4167
## Season Curr.First.Dep.Date Pnr.Pax.Count Lang
## Season1:19969 26-Jul-15: 380 Min. :1.000 GB :76459
## Season2:25578 23-Dec-15: 345 1st Qu.:1.000 CN : 5705
## Season3:20056 23-Nov-15: 342 Median :1.000 TW : 5139
## Season4:22753 31-Jan-16: 341 Mean :1.366 KO : 615
## 1-Feb-16 : 337 3rd Qu.:2.000 JP : 217
## 2-Aug-15 : 319 Max. :6.000 DE : 68
## (Other) :86292 (Other): 153
## Pnr.Curr.Code Pnr.Fare.Amt Bkg.Subclass Bkg_Subclass
## : 0 Min. : 0.0 V :29281 V :29611
## AUD:88356 1st Qu.: 939.9 S : 9827 L :14271
## Median : 1362.7 L, V : 8735 S : 9841
## Mean : 1976.5 M, V : 5665 M : 9071
## 3rd Qu.: 2189.9 E : 3185 K : 6955
## Max. :33517.4 K, V : 3132 E : 5678
## (Other):28531 (Other):12929
## Net.Net.Rev Sales.Promo first_ticket_O
## Min. : 15.93 :87872 HKG :57459
## 1st Qu.: 422.81 RUGBY7AU : 325 BNE : 8538
## Median : 613.78 AUSHKGYOW5 (May)15: 49 ADL : 7085
## Mean : 952.77 MICE549 : 27 CNS : 1903
## 3rd Qu.: 1012.07 MICE629 : 15 SYD : 1710
## Max. :20161.28 MICE989 : 15 MEL : 1254
## NA's :26392 (Other) : 53 (Other):10407
## first_ticket_D Route Pnr.Creat.Pseudo.City.Code
## HKG :30813 HKGSYD, SYDHKG:11632 SYDCX08AA:87748
## SYD :14728 HKGMEL, MELHKG: 6907 SYDCX08CP: 330
## MEL :13098 HKGPER, PERHKG: 4011 SYDCX08PA: 278
## PER : 7580 BNEHKG, HKGBNE: 1988
## PEK : 2881 SYDHKG : 1710
## ICN : 2515 ADLHKG, HKGADL: 1649
## (Other):16741 (Other) :60459
Suppose the date of booking is earlier than date of flight and the currency is AUD, i.e.numberOfDay >= 0 and Pnr.Curr.Code==‘AUD’ .Also , there is no duplicated RLOC(duplicated data). 850 rows is removed from the dataset. After cleared data, there is 88356 rows
Pnr.Fare.Amt
Most tickets is around AUD 1000
Pnr.Fare.Amt
##
## FALSE TRUE
## 88234 122
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 939.9 1363.0 1976.0 2190.0 33520.0
There were 122 transctions are free of charge in 2015. The mean of each transction is AUD 1976. The median of each transction is AUD 1363.
Net.Net.Rev
Most of the net revenue for each transaction is around AUD 500
Net.Net.Rev
##
## FALSE
## 61964
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 15.93 422.80 613.80 952.80 1012.00 20160.00 26392
There is 26392 which do not have record. The median of the net revenue is AUD 613.8
Number.of.day
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Most passengers will plan the trip around 3 months before.
Number.of.day
##
## FALSE TRUE
## 87807 549
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 29.0 74.0 97.2 144.0 492.0
There is 549 passengers bought the ticket shortly.
I am wonder why there is so many passengers flight from HK and if they flight to Australia.
Origin = HKG
## [1] 57459 30
## Pnr.Creat.Date month_pnr_create Rloc Bkg.User.Id
## 15/10/15: 518 .02. : 7502 211 (Nov)DG: 1 GUEST : 334
## 19/2/15 : 472 .10. : 7219 22 (Feb)V2 : 1 GUEST366: 68
## 9/2/15 : 446 .03. : 6259 2224JW : 1 GUEST001: 58
## 18/2/15 : 441 .11. : 5023 2224UD : 1 GUEST392: 57
## 16/10/15: 433 .04. : 4733 2224VP : 1 GUEST748: 57
## 17/2/15 : 419 .05. : 4340 22282J : 1 GUEST227: 56
## (Other) :54730 (Other):22383 (Other) :57453 (Other) :56829
## Collect.Method Curr.First.Dep.Apt numberOfDay month_dep
## : 4 HKG :18558 Min. : 0.00 .05. : 5656
## ET:57455 SYD :10242 1st Qu.: 29.00 .06. : 5512
## MEL : 7116 Median : 71.00 .11. : 5477
## PER : 6257 Mean : 94.87 .04. : 5368
## PEK : 2301 3rd Qu.:138.00 .12. : 5282
## PVG : 1963 Max. :491.00 .03. : 5216
## (Other):11022 (Other):24948
## from_1st to_1st from_2nd to_2nd
## HKG :57459 SYD :14728 HKG :28593 HKG :28564
## ADL : 0 MEL :13098 SYD :13760 SYD : 8324
## AMS : 0 PER : 7580 MEL : 8865 PER : 5067
## ARN : 0 PEK : 2881 PER : 5360 MEL : 3113
## ATH : 0 ICN : 2515 : 294 TPE : 2655
## BAH : 0 LHR : 2496 PEK : 120 PVG : 1962
## (Other): 0 (Other):14161 (Other): 467 (Other): 7774
## from_3rd to_3rd destination type
## :28548 HKG :28687 HKG :56993 Muli : 166
## MEL : 6163 :28548 SYD : 127 Oneway: 300
## PER : 3515 TPE : 120 MEL : 70 return:56993
## SYD : 2916 YVR : 34 ICN : 63
## ICN : 2349 SYD : 22 NRT : 48
## PEK : 2294 PER : 8 PER : 36
## (Other):11674 (Other): 40 (Other): 122
## Season Curr.First.Dep.Date Pnr.Pax.Count Lang
## Season1:12542 26-Jul-15: 254 Min. :1.000 GB :50412
## Season2:16536 23-Nov-15: 250 1st Qu.:1.000 CN : 3297
## Season3:13008 1-Feb-16 : 233 Median :1.000 TW : 3048
## Season4:15373 8-Jun-15 : 233 Mean :1.341 KO : 423
## 26-Jan-16: 228 3rd Qu.:1.000 JP : 177
## 12-Dec-15: 222 Max. :6.000 IT : 30
## (Other) :56039 (Other): 72
## Pnr.Curr.Code Pnr.Fare.Amt Bkg.Subclass Bkg_Subclass
## : 0 Min. : 0.0 V :18915 V :19169
## AUD:57459 1st Qu.: 940.9 S : 7177 L : 9241
## Median : 1347.9 L, V : 5755 S : 7188
## Mean : 1923.0 M, V : 3822 M : 5917
## 3rd Qu.: 2137.2 K, V : 2127 K : 4573
## Max. :33517.4 L : 1846 E : 3565
## (Other):17817 (Other): 7806
## Net.Net.Rev Sales.Promo first_ticket_O first_ticket_D
## Min. : 15.93 :57134 HKG :57459 SYD :14728
## 1st Qu.: 430.92 RUGBY7AU: 227 ADL : 0 MEL :13098
## Median : 620.50 MICE549 : 26 AMS : 0 PER : 7580
## Mean : 942.71 MICE629 : 15 ARN : 0 PEK : 2881
## 3rd Qu.: 1010.91 MICE989 : 13 ATH : 0 ICN : 2515
## Max. :17994.10 MICE700 : 10 BAH : 0 LHR : 2496
## NA's :16265 (Other) : 34 (Other): 0 (Other):14161
## Route Pnr.Creat.Pseudo.City.Code
## HKGSYD, SYDHKG :11632 SYDCX08AA:56990
## HKGMEL, MELHKG : 6907 SYDCX08CP: 226
## HKGPER, PERHKG : 4011 SYDCX08PA: 243
## HKGPEK, HKGSYD, PEKHKG, SYDHKG: 1363
## HKGLHR, HKGSYD, LHRHKG, SYDHKG: 1160
## HKGMEL, HKGPEK, MELHKG, PEKHKG: 1022
## (Other) :31364
##
## .01. .02. .03. .04. .05. .06. .07. .08. .09. .10. .11. .12.
## 3819 3507 5216 5368 5656 5512 4898 4192 3918 4614 5477 5282
##
## B C D E F H I J K L M N
## 1083 93 211 3565 1 2180 1698 15 4573 9241 5917 1913
## Q R S T U V W Y
## 0 353 7188 155 59 19169 26 19
There is 57459 bookings(65%) paid in AUD flight from HKG. I am so surprise that the popular months is not February and August (the school starting day) and the popular booking class is not business class (for business trip)
Origin = HKG (in month of departure)
The popular city( 1st destination) are PAR(Paris), SHA(Shanghai)and YVR(Vancouver), all of that are commercial city.
create AU table
Origin = HKG , destination = AU
## [1] 35628 30
## to_1st Pnr.Creat.Date month_pnr_create Rloc
## SYD :14728 15/10/15: 330 .10. : 4635 211 (Nov)DG: 1
## MEL :13098 19/2/15 : 298 .02. : 4392 22 (Feb)V2 : 1
## PER : 7580 16/10/15: 265 .03. : 3511 2224JW : 1
## BNE : 189 17/2/15 : 264 .11. : 3117 2224VP : 1
## CNS : 33 9/2/15 : 264 .09. : 2888 2228IG : 1
## ADL : 0 19/10/15: 258 .04. : 2867 222A5C : 1
## (Other): 0 (Other) :33949 (Other):14218 (Other) :35622
## Bkg.User.Id Collect.Method Curr.First.Dep.Apt numberOfDay
## GUEST : 255 : 3 HKG :15665 Min. : 0.00
## GUEST305: 40 ET:35625 SYD : 5551 1st Qu.: 28.00
## GUEST366: 40 MEL : 4351 Median : 65.00
## GUEST909: 37 PER : 2898 Mean : 89.83
## GUEST546: 36 TPE : 1409 3rd Qu.:127.00
## GUEST582: 36 PVG : 1306 Max. :491.00
## (Other) :35184 (Other): 4448
## month_dep from_1st from_2nd to_2nd
## .11. : 3734 HKG :35628 HKG :11903 HKG :23472
## .03. : 3350 ADL : 0 SYD :11785 TPE : 2644
## .12. : 3276 AMS : 0 MEL : 7082 PVG : 1940
## .04. : 3268 ARN : 0 PER : 4105 PEK : 1141
## .05. : 3233 ATH : 0 : 251 XMN : 953
## .06. : 3147 BAH : 0 PEK : 95 WUH : 772
## (Other):15620 (Other): 0 (Other): 407 (Other): 4706
## from_3rd to_3rd destination type
## :23493 :23493 HKG :35217 Muli : 159
## MEL : 5709 HKG :12101 SYD : 123 Oneway: 252
## PER : 3216 YVR : 7 MEL : 69 return:35217
## SYD : 2527 SYD : 5 ICN : 61
## ICN : 86 TPE : 5 NRT : 46
## PEK : 85 PEK : 4 PER : 36
## (Other): 512 (Other): 13 (Other): 76
## Season Curr.First.Dep.Date Pnr.Pax.Count Lang
## Season1: 8084 23-Nov-15: 178 Min. :1.000 GB :31276
## Season2: 9648 2-Feb-16 : 177 1st Qu.:1.000 TW : 2319
## Season3: 7888 26-Jul-15: 172 Median :1.000 CN : 1945
## Season4:10008 26-Jan-16: 171 Mean :1.293 KO : 37
## 12-Dec-15: 165 3rd Qu.:1.000 JP : 18
## 1-Feb-16 : 163 Max. :6.000 IT : 13
## (Other) :34602 (Other): 20
## Pnr.Curr.Code Pnr.Fare.Amt Bkg.Subclass Bkg_Subclass
## : 0 Min. : 0.0 V :11771 V :11991
## AUD:35628 1st Qu.: 926.7 S : 4224 L : 5809
## Median : 1289.4 L, V : 3793 S : 4228
## Mean : 1775.3 M, V : 2689 M : 3744
## 3rd Qu.: 1944.3 K, V : 1623 K : 3191
## Max. :27202.2 L, M : 1034 E : 2124
## (Other):10494 (Other): 4541
## Net.Net.Rev Sales.Promo first_ticket_O first_ticket_D
## Min. : 15.93 :35346 HKG :35628 SYD :14728
## 1st Qu.: 440.00 RUGBY7AU: 189 ADL : 0 MEL :13098
## Median : 604.66 MICE549 : 26 AMS : 0 PER : 7580
## Mean : 889.32 MICE629 : 15 ARN : 0 BNE : 189
## 3rd Qu.: 965.54 MICE989 : 12 ATH : 0 CNS : 33
## Max. :17947.72 MICE700 : 9 BAH : 0 ADL : 0
## NA's :9633 (Other) : 31 (Other): 0 (Other): 0
## Route Pnr.Creat.Pseudo.City.Code
## HKGSYD, SYDHKG :11632 SYDCX08AA:35172
## HKGMEL, MELHKG : 6907 SYDCX08CP: 226
## HKGPER, PERHKG : 4011 SYDCX08PA: 230
## HKGMEL, HKGPEK, MELHKG, PEKHKG: 1022
## HKGPER, HKGPVG, PERHKG, PVGHKG: 933
## HKGSYD, HKGTPE, SYDHKG, TPEHKG: 826
## (Other) :10297
There is 35628 bookings paid in AUD flight from HKG to AU. It is not surprise that the most popular city are SYD(Sydney) , PER(Perth) and MEL(Melbourne) , are main cities in Australia.
Origin = AU
## [1] 14089 30
## from_1st Pnr.Creat.Date month_pnr_create Rloc
## BNE :8538 15/10/15: 84 .10. :1692 2.34E+08: 1
## CNS :1903 26/10/15: 84 .03. :1580 2228AK : 1
## SYD :1710 19/10/15: 77 .02. :1517 222DYF : 1
## MEL :1254 16/10/15: 75 .11. :1216 2239AK : 1
## PER : 684 9/2/15 : 74 .04. :1158 2239VA : 1
## ADL : 0 12/2/15 : 72 .05. :1127 223TYU : 1
## (Other): 0 (Other) :13623 (Other):5799 (Other) :14083
## Bkg.User.Id Collect.Method Curr.First.Dep.Apt numberOfDay
## GUEST : 80 : 0 BNE :4212 Min. : 0.0
## GUEST315: 22 ET:14089 HKG :2208 1st Qu.: 16.0
## GUEST515: 21 SYD :1743 Median : 51.0
## GUEST907: 21 MEL :1218 Mean : 81.6
## GUEST097: 20 CNS :1203 3rd Qu.:124.0
## GUEST583: 20 PER : 681 Max. :469.0
## (Other) :13905 (Other):2824
## month_dep to_1st from_2nd to_2nd
## .06. :1472 HKG :14089 HKG :8608 BNE :6000
## .05. :1411 ADL : 0 :4412 :4412
## .03. :1405 AMS : 0 CTU : 163 CNS :1075
## .04. :1296 ARN : 0 HGH : 152 HKG :1058
## .11. :1274 ATH : 0 CDG : 107 PVG : 159
## .07. :1178 BAH : 0 FCO : 88 LHR : 152
## (Other):6053 (Other): 0 (Other): 559 (Other):1233
## from_3rd to_3rd destination type Season
## :8088 :8088 HKG :9279 Muli :5971 Season1:3324
## HKG :5947 LHR : 958 BNE :2016 Oneway:5791 Season2:4179
## PVG : 16 BNE : 941 CNS : 324 return:2327 Season3:3133
## PEK : 8 PVG : 705 LHR : 202 Season4:3453
## LHR : 4 PEK : 596 CTU : 169
## MNL : 4 ICN : 358 PVG : 163
## (Other): 22 (Other):2443 (Other):1936
## Curr.First.Dep.Date Pnr.Pax.Count Lang Pnr.Curr.Code
## 25-Nov-15: 64 Min. :1.000 GB :11486 : 0
## 12-Nov-15: 61 1st Qu.:1.000 TW : 1330 AUD:14089
## 30-Jun-15: 61 Median :1.000 CN : 1052
## 14-Nov-15: 57 Mean :1.348 KO : 113
## 21-Jul-15: 55 3rd Qu.:2.000 DE : 33
## 26-Jul-15: 54 Max. :6.000 FR : 24
## (Other) :13737 (Other): 51
## Pnr.Fare.Amt Bkg.Subclass Bkg_Subclass Net.Net.Rev
## Min. : 0.0 V :5226 V :5255 Min. : 58.54
## 1st Qu.: 845.2 L, V : 975 L :1886 1st Qu.: 389.96
## Median : 1251.6 S : 932 M :1473 Median : 584.18
## Mean : 1921.8 M : 764 K :1080 Mean : 951.99
## 3rd Qu.: 2114.7 E : 751 E :1075 3rd Qu.: 979.48
## Max. :30368.8 M, V : 670 S : 932 Max. :17846.52
## (Other):4771 (Other):2388 NA's :3724
## Sales.Promo first_ticket_O first_ticket_D
## :13961 BNE :8538 HKG :14089
## RUGBY7AU : 73 CNS :1903 ADL : 0
## AUSHKGYOW5 (May)15: 46 SYD :1710 AMS : 0
## MICE700 : 3 MEL :1254 ARN : 0
## MICE684 : 1 PER : 684 ATH : 0
## MICE697 : 1 ADL : 0 BAH : 0
## (Other) : 4 (Other): 0 (Other): 0
## Route Pnr.Creat.Pseudo.City.Code
## BNEHKG, HKGBNE :1988 SYDCX08AA:13970
## SYDHKG :1710 SYDCX08CP: 98
## MELHKG :1253 SYDCX08PA: 21
## BNEHKG, HKGBNE, HKGLHR, LHRHKG: 711
## PERHKG : 683
## BNEHKG, HKGBNE, HKGPVG, PVGHKG: 601
## (Other) :7143
There is only 14089 bookings(around 16%) paid in AUD flight from AU. Most of them is multi-way or one-way tickets. All of these bookings is flight to HKG.
create some variable
I create three variables for finding net revenue/passenger, fare/ passenger, because I am to know this information for each passenger , not for each transaction. i.e. \[netPerPerson = Net.Net.Rev/Pnr.Pax.Count\] \[farePerPerson = Pnr.Fare.Amt/Pnr.Pax.Count\]
There is many missing data in netPerPerson column. It cannot find the the ratio of fare to net revenue.
Remove all NA in Net revenue/Person and create a variable
\[fare_to_net = farePerPerson /netPerPerson\] There is so many data removed.
Remove all NA in Net revenue/Person
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.696 1.883 2.258 2.238 112.600
The ratio is mainly around 2.
Then, I am wonder why the ratio is not in same range. Which factor affect this , date of departure , booking class or type of way.( return-way/ one-way/ multi-way)
fare_to_net, by month of departure
## month_dep: .01.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.660 2.108 3.045 3.835 62.760
## --------------------------------------------------------
## month_dep: .02.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.648 1.915 2.733 3.855 25.390
## --------------------------------------------------------
## month_dep: .03.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.653 1.830 2.118 2.143 45.120
## --------------------------------------------------------
## month_dep: .04.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.640 1.787 2.055 2.057 22.370
## --------------------------------------------------------
## month_dep: .05.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.722 1.921 2.135 2.182 49.830
## --------------------------------------------------------
## month_dep: .06.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.712 1.891 2.096 2.155 24.360
## --------------------------------------------------------
## month_dep: .07.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.643 1.790 2.043 2.062 36.020
## --------------------------------------------------------
## month_dep: .08.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.698 1.859 2.073 2.140 18.630
## --------------------------------------------------------
## month_dep: .09.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.690 1.843 2.066 2.107 22.670
## --------------------------------------------------------
## month_dep: .10.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.706 1.848 2.121 2.102 112.600
## --------------------------------------------------------
## month_dep: .11.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.797 2.074 2.506 3.119 41.280
## --------------------------------------------------------
## month_dep: .12.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.485 3.894 4.321 4.399 35.950
In December , the median of ratio is the highest in a year.
fare_to_net, by booking subclass
## Bkg_Subclass: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.542 1.635 1.882 1.848 10.000
## --------------------------------------------------------
## Bkg_Subclass: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3671 1.3730 1.4620 1.7060 1.5380 5.9860
## --------------------------------------------------------
## Bkg_Subclass: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3012 1.3890 1.4500 1.5830 1.5340 12.7900
## --------------------------------------------------------
## Bkg_Subclass: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.514 1.590 1.818 1.717 17.320
## --------------------------------------------------------
## Bkg_Subclass: F
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.319 1.319 1.319 1.319 1.319 1.319
## --------------------------------------------------------
## Bkg_Subclass: H
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.591 1.692 1.970 1.934 22.670
## --------------------------------------------------------
## Bkg_Subclass: I
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.409 1.475 1.703 1.590 43.020
## --------------------------------------------------------
## Bkg_Subclass: J
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6167 1.3300 1.3900 1.7240 1.5110 7.0410
## --------------------------------------------------------
## Bkg_Subclass: K
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.656 1.783 2.120 2.105 26.620
## --------------------------------------------------------
## Bkg_Subclass: L
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.747 1.955 2.396 2.429 36.020
## --------------------------------------------------------
## Bkg_Subclass: M
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.698 1.864 2.304 2.288 112.600
## --------------------------------------------------------
## Bkg_Subclass: N
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.803 1.921 2.429 2.040 45.120
## --------------------------------------------------------
## Bkg_Subclass: Q
## NULL
## --------------------------------------------------------
## Bkg_Subclass: R
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5428 1.4490 1.5040 1.7710 1.5810 41.2800
## --------------------------------------------------------
## Bkg_Subclass: S
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.109 2.195 2.566 2.414 49.830
## --------------------------------------------------------
## Bkg_Subclass: T
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.123 1.671 1.783 2.110 1.917 8.847
## --------------------------------------------------------
## Bkg_Subclass: U
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.710 1.314 1.497 1.886 1.604 13.770
## --------------------------------------------------------
## Bkg_Subclass: V
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.774 1.950 2.352 2.314 35.800
## --------------------------------------------------------
## Bkg_Subclass: W
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.309 1.426 1.455 2.341 1.523 41.280
## --------------------------------------------------------
## Bkg_Subclass: Y
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.239 1.400 1.560 2.414 2.165 12.440
The highest median of ratio is the S class among all booking subclass.
fare_to_net, by type
## type: Muli
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.766 2.026 2.413 2.530 45.120
## --------------------------------------------------------
## type: Oneway
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.620 1.709 2.151 1.861 41.280
## --------------------------------------------------------
## type: return
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.704 1.886 2.231 2.215 112.600
The highest median of ratio is multi-way.
Then , I want to set-up a linear model to predict the net revenue from the price of fare.
Linear_Model
##
## Call:
## lm(formula = netPerPerson ~ farePerPerson, data = remove_NA_net)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9662.7 -55.0 26.2 88.1 5222.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -76.899472 2.174457 -35.37 <2e-16 ***
## farePerPerson 0.578773 0.001155 501.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 345.8 on 61962 degrees of freedom
## Multiple R-squared: 0.8021, Adjusted R-squared: 0.8021
## F-statistic: 2.511e+05 on 1 and 61962 DF, p-value: < 2.2e-16
There are 89,206 booking , which pay in AUD, in the dataset with 29 features.
[1] “Pnr.Creat.Date”(the date of create booking) [2] “month_pnr_create”(the month of create booking) [3] “Rloc” ( the reference number of booking)
[4] “Bkg.User.Id” [5] “Collect.Method” (only e-ticket)
[6] “Curr.First.Dep.Apt”
[7] “numberOfDay” ( number of days between date of create booking and date of departure)
[8] “month_dep” (the month of departure)
[9] “from_1st” (the 1st city of departure)
[10] “to_1st” (the 1st city of arrival)
[11] “from_2nd” (the 2nd city of departure)
[12] “to_2nd” (the 2nd city of arrival) [13] “from_3rd” (the 3rd city of departure)
[14] “to_3rd” (the 3rd city of arrival)
[15] “destination”
[16] “type” (one-way/return way/ multi-way)
[17] “Season” (the season of departure
[18] “Curr.First.Dep.Date”
[19] “Pnr.Pax.Count” (the number of passengers and companions in each booking)
[20] “Lang” (language shown on website)
[21] “Pnr.Curr.Code”
[22] “Pnr.Fare.Amt” (the amount of Fare in AUD)
[23] “Bkg.Subclass”
[24] “Bkg_Subclass” (the booking subclass)
[25] “Net.Net.Rev” (the net revenue)
[26] “Sales.Promo”
[27] “first_ticket_O”
[28] “first_ticket_D”
[29] “Route”
[30] “Pnr.Creat.Pseudo.City.Code”
The variables Lang, Destination, and Bkg_Subclass, are ordered factor variables with the following levels.
Lang(the language shown on website): “CN” “DE” “FR” “GB” “ID” “IT” “JP” “KO” “KR” “NL” “RE” “RU” “TH” “TW”
Destination: “ADL” “AMS” “BKK” “BLR” “BNE” “BOM” “BOS” “CAN” “CCU” “CDG” “CEB” “CGK” “CGO” “CJU” “CKG” “CMB” “CNS” “CNX” “CPT” “CRK” “CSX” “CTS” “CTU” “DAC” “DAD” “DEL” “DFW” “DLC” “DME” “DOH” “DPS” “DUR” “DUS” “DXB” “EWR” “FCO” “FOC” “FRA” “FUK” “HAK” “HAN” “HGH” “HKG” “HKT” “HND” “HYD” “ICN” “JFK” “JMU” “JNB” “KHH” “KIX” “KMG” “KTM” “KUL” “KWL” “LAX” “LHR” “MAA” “MAD” “MAN” “MEL” “MLE” “MNL” “MXP” “NGB” “NGO” “NKG” “NRT” “OKA” “ORD” “PEK” “PER” “PNH” “PUS” “PVG” “RMQ” “RUH” “SFO” “SGN” “SHA” “SIN” “SYD” “SYX” “TAO” “TPE” “TSN” “VVO” “WNZ” “WUH” “XIY” “XMN” “YVR” “YYZ” “ZRH”
Bkg_Subclass : (best) —————-> (worst) 1) First Class (Flex): F, 2) Business Class : J, C, D, I 3) Premium Economy : W, R, E 4) Economy Class (Standard/flex) : Y, B, H, K, M, L, V 5) Economy Class(Special) : S, N, Q 6) Economy Class(Supersaver): T,U
Other observations: Most passengers buy return tickets flight from HKG in English version. For each transaction , the median of the number of passenger is one. The amount of fare from AUD 0 to AUD 33,517.4 There is 61505 is return ticket. Some of transaction duplicate that the number of RLOC is more than one.
The main features in the data set are the amount of the fare and net revenue, booking subclass, month of create booking/departure and season . I would like to look into what factors will increase passengers and net renvenues. I suspect if peak season/month and booking subclass are the main factor to predict this data.
Also , departure city (1st , 2nd and 3rd ) , arrival city (1st , 2nd and 3rd ), type of tickets and language shown on website likely contribute to the number of passengers. I think language and type of tickets probably contribute most to the number of passengers after researching information on the dataset.
I create three variables for finding net revenue/passenger, fare/ passenger, because I am to know this information for each passenger , not for each transaction. i.e. \[netPerPerson = Net.Net.Rev/Pnr.Pax.Count\] \[farePerPerson = Pnr.Fare.Amt/Pnr.Pax.Count\]
I also create the ratio of fare to net revenue. i.e. \[fare_to_net = farePerPerson /netPerPerson\]
Suppose the date of booking is earlier than date of flight and the currency is AUD, i.e.numberOfDay >= 0 and Pnr.Curr.Code==‘AUD’ .Also , there is no duplicated RLOC(duplicated data). 850 rows is removed from the dataset. After cleared data, there is 88356 rows
There are too much missing data in the net revenue ( 26605 is NA ). Therefore , only relevant to net revenue will use cleared data set.(remove_NA_net)
Correlation_Matrix
## numberOfDay netPerPerson farePerPerson fare_to_net
## numberOfDay 1.00000000 -0.09984963 -0.04312767 0.19311712
## netPerPerson -0.09984963 1.00000000 0.89558071 -0.24283984
## farePerPerson -0.04312767 0.89558071 1.00000000 0.02418034
## fare_to_net 0.19311712 -0.24283984 0.02418034 1.00000000
Net revenue/person by booking subclasss
The median of net revenue in economy class is the lowest. The better class, the variance of net revenue is increase.
Booking subclasssn by season
V class is the popular booking class in every seasons. From this graph , it can find that only a small porprotion of passengers is able afford the fare of business class.
Date of departure, booking class and type of way are the facter affect the ratio of fare to net revenue.In December , the median of ratio is the highest in a year.This result is same as expected because Christmas is in December and it is a important festival for western family.The highest median of ratio is the S class among all booking subclass. The highest median of ratio is multi-way.Normally, multi-way is the most expensive in ticket type.
It is interested to observe tht economy class is the most popular and the median of net revenue in economy class is the lowest.
The price of a tickets is positively and strongly correlated with net revenue.The fare could be used in a model to predict the the net revenue in each booking which show perfect correlation.
Price_Histograms_by_Qualitative_Factors
These density plots explain the odd trends that were seen in the box plots earlier. There is higher variances in lower fare/passenger with different levels of season, booking class and type.
fare_to_net in Season
## Season: Season1
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.653 1.851 2.439 2.658 62.760
## --------------------------------------------------------
## Season: Season2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.689 1.864 2.097 2.146 49.830
## --------------------------------------------------------
## Season: Season3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.674 1.828 2.060 2.105 36.020
## --------------------------------------------------------
## Season: Season4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.770 2.021 2.566 3.245 112.600
Wow! Only season 4 has highest median for the ratio having the largest variation. Season2 and Season 3 have the least variation.
fare_to_net in booking class
## Bkg_Subclass: B
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.542 1.635 1.882 1.848 10.000
## --------------------------------------------------------
## Bkg_Subclass: C
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3671 1.3730 1.4620 1.7060 1.5380 5.9860
## --------------------------------------------------------
## Bkg_Subclass: D
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3012 1.3890 1.4500 1.5830 1.5340 12.7900
## --------------------------------------------------------
## Bkg_Subclass: E
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.514 1.590 1.818 1.717 17.320
## --------------------------------------------------------
## Bkg_Subclass: F
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.319 1.319 1.319 1.319 1.319 1.319
## --------------------------------------------------------
## Bkg_Subclass: H
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.591 1.692 1.970 1.934 22.670
## --------------------------------------------------------
## Bkg_Subclass: I
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.409 1.475 1.703 1.590 43.020
## --------------------------------------------------------
## Bkg_Subclass: J
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6167 1.3300 1.3900 1.7240 1.5110 7.0410
## --------------------------------------------------------
## Bkg_Subclass: K
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.656 1.783 2.120 2.105 26.620
## --------------------------------------------------------
## Bkg_Subclass: L
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.747 1.955 2.396 2.429 36.020
## --------------------------------------------------------
## Bkg_Subclass: M
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.698 1.864 2.304 2.288 112.600
## --------------------------------------------------------
## Bkg_Subclass: N
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.803 1.921 2.429 2.040 45.120
## --------------------------------------------------------
## Bkg_Subclass: Q
## NULL
## --------------------------------------------------------
## Bkg_Subclass: R
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5428 1.4490 1.5040 1.7710 1.5810 41.2800
## --------------------------------------------------------
## Bkg_Subclass: S
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.109 2.195 2.566 2.414 49.830
## --------------------------------------------------------
## Bkg_Subclass: T
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.123 1.671 1.783 2.110 1.917 8.847
## --------------------------------------------------------
## Bkg_Subclass: U
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.710 1.314 1.497 1.886 1.604 13.770
## --------------------------------------------------------
## Bkg_Subclass: V
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.774 1.950 2.352 2.314 35.800
## --------------------------------------------------------
## Bkg_Subclass: W
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.309 1.426 1.455 2.341 1.523 41.280
## --------------------------------------------------------
## Bkg_Subclass: Y
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.239 1.400 1.560 2.414 2.165 12.440
The discount ticket in economy class have the lowest median ratio. This is such an usual trend.
fare_to_net in type
## type: Muli
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.766 2.026 2.413 2.530 45.120
## --------------------------------------------------------
## type: Oneway
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.620 1.709 2.151 1.861 41.280
## --------------------------------------------------------
## type: return
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.704 1.886 2.231 2.215 112.600
I guess the median of ratio in one-way is the largest, it is different to my expect. The multi-way is the highest median among the type having the largest interquatile range.
It looks like return-way ticket is the popular type. Different type has similar range of ratio.
These plots support the variability and trends that the boxplots showed from before. I am going see which variables correlate with net revenue.
It seems that there is no strong relationship between the ratio of fare to net and the number of days. For season1 , mostly, they plan the trip before 3 months, but for other season , they likely buy it in shortly .There is no clear trend, pattern or observation in booking subclass and type.
The multi-way is the highest median among the type having the largest interquatile range. Only season 4 has highest median for the ratio having the largest variation. Season2 and Season 3 have the least variation. The discount ticket in economy class have the lowest median ratio. This is such an usual trend. It seems that there is no strong relationship between the ratio of fare to net and the number of days. For season1 , mostly, they plan the trip before 3 months, but for other season , they likely buy it in shortly .There is no clear trend, pattern or observation in booking subclass and type.
The multi-way is the highest median among the type having the largest interquatile range. Only season 4 has highest median for the ratio having the largest variation. Season2 and Season 3 have the least variation. The discount ticket in economy class have the lowest median ratio. This is such an usual trend. ### OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.
Passengers prefer buy lowest price of tickets,however , the median of net revenue is also lowest.The proportion of season2 in almost every booking subclass is the highest, passengers likely go to travel in Season 2.
### Description Two
It find that the relationship between months and price of fare. For the peak periods, the price will be increse with higher variance.The mean of fare is under AUD 1000 in whole year.(except January) and the all distribution of fare is positive skew.
### Description Three
This graph shows the relationship between fare and net revenue booking subclass.Obviously ,the price of fare is higher, the net revenue is higher. Also, there are two recognized lines on the graph that means there are at least two formula for calculate fare by net revenue for different situations.
The data set contains booking information on almost 90 thousand transactions from around 2015. I started by understanding the individual variables in the data set and created a linear model to predict net revenue of ticket, and then I explored interesting questions and leads as I continued to make observations on plots. Eventully , I explored the number of passengers and the amount of net renvenue/fare per passenger across many variables. At first , I was wonder why so many bookings created on February and October, it is because the tickets in these two months is the cheapest in a year.It is easy to understand that the booking class is mainly class V/L/S/M/K , which concentrate on enconomy class with lowest median of fare.Also, it is strong demand for travel originating from HongKong ,benefited from the weakness of the Euros and the Australian dollars in the first half of 2015.It reflects demand on regional routes is strong, particular in economy class. There was strong economy class demand on long-haul routes.For this, I am strongly recommend increase flights to popular destination over the peak period.Also, using larger aircarft such as Boeing 777-300ER on popular flight a day will increase capacity.